CONFIDENTIAL CASE FILE

Evidence Against the Defendant

The histogram of survival time shows a right-skewed distribution: many victims live with the consequences of breast cancer for several years, while a subset experience earlier “fatal” outcomes. Coloring by vital status highlights that deaths cluster at shorter follow-up times, as we would expect when a dangerous criminal strikes early.

The Kaplan–Meier curves show how survival unfolds over time for each racial group. Early on, survival probabilities remain high, but the curves gradually step downward as more victims succumb. The attached risk table shows how many patients remain “under surveillance” at each time point, reminding the jury that the end of each curve is based on relatively few individuals.

Comparing curves across racial groups reveals apparent disparities in survival. Some groups show consistently lower survival probabilities at comparable times, suggesting that Breast Cancer does not strike all communities equally. These differences motivate treating race as a key covariate in our multivariable survival model to formally assess these disparities while adjusting for other clinical factors.
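To make the step-down behaviour of the curves and the shrinking risk table concrete, here is a minimal Python sketch of the Kaplan–Meier product-limit estimator. It is not the code behind the report's figures; the function name `kaplan_meier` and the toy data are invented for illustration only.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, n_at_risk, n_deaths, S(t))] at each distinct death time.

    times  -- follow-up time for each patient
    events -- 1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    n_at_risk = len(times)
    survival = 1.0
    rows = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # The curve only steps down at times where a death is observed.
            survival *= 1.0 - d / n_at_risk
            rows.append((t, n_at_risk, d, survival))
        # Deaths and censored patients both leave the risk set after time t.
        n_at_risk -= exits[t]
    return rows


# Toy data, invented purely for illustration (months, death indicator).
toy_times = [6, 7, 10, 15, 19, 25]
toy_events = [1, 0, 1, 1, 0, 1]

for t, n, d, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t:>2}  at risk={n}  deaths={d}  S(t)={s:.3f}")
```

The `at risk` column plays the role of the risk table: by the last death time only one toy patient remains, so a single event moves the estimate a long way, which is why the tail of each curve deserves caution.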

Putting the Defendant on the Stand: Cox Proportional Hazards Model

The Cox proportional hazards model allows us to quantify the impact of various clinical and demographic factors on survival time. The hazard ratios (HR) indicate how the risk of death changes with each covariate, holding all other factors constant. For example, an HR greater than 1 suggests increased risk, while an HR less than 1 indicates a protective effect.

The forest plot visualizes the hazard ratios and their 95% confidence intervals for each covariate in the Cox model. Covariates whose confidence intervals do not cross 1 are considered statistically significant predictors of survival at the 5% level. This visualization helps the jury quickly see which factors are associated with patient outcomes, in which direction, and how precisely each effect is estimated.
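As a rough illustration of how each row of a forest plot is obtained, the Python sketch below converts a Cox log-hazard coefficient and its standard error into a hazard ratio with a Wald-type 95% interval, then checks whether the interval crosses 1. The function names and the example coefficient are hypothetical and are not taken from the fitted model.

```python
import math


def hazard_ratio_ci(beta, se, z=1.96):
    """Convert a log-hazard coefficient and its standard error into
    (hazard ratio, lower bound, upper bound) using a Wald-type interval."""
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return hr, lower, upper


def crosses_one(lower, upper):
    """True when the interval includes HR = 1 (no detectable effect)."""
    return lower <= 1.0 <= upper


# Hypothetical coefficient and standard error, not values from the report.
hr, lo, hi = hazard_ratio_ci(beta=0.50, se=0.10)
print(f"HR={hr:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], crosses 1: {crosses_one(lo, hi)}")
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the hazard ratio, which is why forest plots are usually drawn on a logarithmic axis.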

The Cox model results provide compelling evidence against the defendant, Breast Cancer. Several clinical and demographic factors significantly influence survival, highlighting the complex interplay of biology and social determinants in this crime. The jury must consider these findings carefully when deliberating on the guilt of the defendant.

Checking the Rules of the Court: Proportional Hazards Assumption

##                          chisq df       p
## age                     0.1051  1 0.74584
## race                    0.9508  2 0.62165
## marital_status          2.9355  4 0.56867
## t_stage                 0.2729  3 0.96505
## n_stage                 1.6048  2 0.44826
## differentiate           1.8576  3 0.60248
## tumor_size              0.9924  1 0.31915
## estrogen_status        29.3770  1 6.0e-08
## progesterone_status    32.3417  1 1.3e-08
## regional_node_examined  0.0104  1 0.91881
## regional_node_positive  0.0164  1 0.89818
## GLOBAL                 51.1608 20 0.00015

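To be admissible, the Cox model must satisfy the proportional hazards (PH) assumption: each covariate’s effect on the hazard is roughly constant over time. We assess this using Schoenfeld residual–based tests, reported in the table above. Small p-values raise concerns about PH violation. The p-values for estrogen status (6.0e-08) and progesterone status (1.3e-08) are far below 0.05, and the global test is also significant (p = 0.00015). This raises concern about the evidence for these two covariates, whose effects appear to change over follow-up.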
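The p-values in the table follow directly from the chi-square statistics and their degrees of freedom. As a small check, the Python sketch below recomputes the p-values for the rows with one degree of freedom, using the fact that a chi-square variable with df = 1 is a squared standard normal. The helper name `chisq_pvalue_df1` is made up for this illustration; rows with more than one degree of freedom would need a general chi-square tail function, which is not shown.

```python
import math


def chisq_pvalue_df1(chisq):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chisq / 2.0))


# Statistics copied from the df = 1 rows of the table above.
df1_rows = [
    ("age", 0.1051),
    ("tumor_size", 0.9924),
    ("estrogen_status", 29.3770),
    ("progesterone_status", 32.3417),
]

for name, chisq in df1_rows:
    p = chisq_pvalue_df1(chisq)
    print(f"{name:<20} chisq={chisq:>8.4f}  p={p:.2g}")
```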

Shaking the Evidence: Bootstrap Hazard Ratios (200 Resamples)

The bootstrap analysis re-estimates the hazard ratios and their confidence intervals by resampling the data with replacement 200 times. This approach helps validate the stability of our Cox model findings, ensuring that the evidence against the defendant is not an artifact of random variation in the sample. The 95% bootstrap confidence intervals offer additional assurance about the reliability of our estimates.
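The resampling logic can be sketched in a few lines of Python. This is an assumption-laden illustration, not the report's procedure: to stay self-contained it bootstraps a crude death-rate ratio between two simulated groups as a stand-in for refitting the Cox model on each resample, and it uses a simple percentile interval. The names `percentile_bootstrap_ci` and `rate_ratio` and all of the data are hypothetical.

```python
import random


def percentile_bootstrap_ci(rows, statistic, n_resamples=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(rows)."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_resamples):
        # Draw n rows with replacement and recompute the statistic.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(sample))
    estimates.sort()
    lower = estimates[round((alpha / 2) * (n_resamples - 1))]
    upper = estimates[round((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper


def rate_ratio(rows):
    """Deaths per unit of follow-up in group 1 divided by the same in group 0.

    Stand-in statistic only; a real analysis would refit the Cox model here.
    """
    totals = {0: [0, 0.0], 1: [0, 0.0]}  # group -> [deaths, person-time]
    for time, event, group in rows:
        totals[group][0] += event
        totals[group][1] += time
    rate0 = totals[0][0] / totals[0][1]
    rate1 = totals[1][0] / totals[1][1]
    return rate1 / rate0


# Simulated rows (time, event, group); follow-up is censored at 30 time units.
sim = random.Random(42)
rows = []
for group, rate in [(0, 0.05), (1, 0.10)]:
    for _ in range(100):
        t = sim.expovariate(rate)
        rows.append((min(t, 30.0), int(t <= 30.0), group))

low, high = percentile_bootstrap_ci(rows, rate_ratio)
print(f"point estimate={rate_ratio(rows):.2f}, 95% bootstrap CI [{low:.2f}, {high:.2f}]")
```

Fixing the seed makes the interval reproducible. With only 200 resamples the interval endpoints are themselves somewhat noisy, so small differences from model-based intervals are expected.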

Can We Predict Who Will Be Attacked Next? 5-Fold Cross-Validation

The 5-fold cross-validation assesses the predictive performance of our Cox model by partitioning the data into five folds, each of which serves once as the test set while the model is fit on the other four. The concordance index (C-index) quantifies how well the model discriminates between patients with different survival outcomes. A C-index closer to 1 indicates excellent discrimination, while a value around 0.5 suggests no better than random guessing. Our model’s mean C-index of 0.733 (SD 0.051) across folds indicates that it is useful for ranking who is at highest risk of death.